Poster: Nemo_bis (Administrator, Curator, or Staff) Date: Aug 2, 2014 8:06am
Forum: texts Subject: Dealing with '-fast' OCR

What's the best course of action when seeing "Doing '-fast' OCR, due to high load on OCR nodes"?

For instance, the following jobs lasted 2.5, 1.9 and 2.3 h: https://catalogd.archive.org/log/326385278 https://catalogd.archive.org/log/326488032 https://catalogd.archive.org/log/326150110 (JP2 images of 290, 180 and 190 MB), vs. 8.8 h for a similar non-fast one: https://catalogd.archive.org/log/326486265 (430 MB of JP2).
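To compare these runs on an equal footing, one can normalize each job's duration by the size of its JP2 input. The sketch below does this in Python using only the figures quoted above. It assumes runtime scales roughly linearly with input size, which is a simplification: page count, image resolution and node load surely matter too. The helper name `hours_per_100mb` is just for illustration.

```python
# Durations (hours) and JP2 sizes (MB) as quoted in the post above.
fast_jobs = [(2.5, 290), (1.9, 180), (2.3, 190)]
non_fast_job = (8.8, 430)


def hours_per_100mb(hours: float, megabytes: float) -> float:
    """Hypothetical helper: normalize a job's runtime to hours per 100 MB of JP2.

    Assumes runtime grows roughly linearly with input size.
    """
    return hours / megabytes * 100


fast_rates = [hours_per_100mb(h, mb) for h, mb in fast_jobs]
fast_mean = sum(fast_rates) / len(fast_rates)
non_fast_rate = hours_per_100mb(*non_fast_job)

for (h, mb), rate in zip(fast_jobs, fast_rates):
    print(f"fast: {h} h / {mb} MB -> {rate:.2f} h per 100 MB")
print(f"non-fast: {non_fast_rate:.2f} h per 100 MB")
print(f"non-fast / mean fast: {non_fast_rate / fast_mean:.2f}x")
```

With these numbers the non-fast job takes a little under twice as long per MB as the average fast one, so on these few samples the '-fast' mode seems to roughly halve processing time per MB. That is a rough indication, not a benchmark.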

Fast OCR is still wonderful, and hopefully a fast OCR of better images is still better than a non-fast OCR of low-resolution images (as in https://catalogd.archive.org/log/322708450 https://catalogd.archive.org/log/325437296 https://catalogd.archive.org/log/324169651 https://catalogd.archive.org/log/325669018 ). But if or when someone wants to proofread the OCR manually, it would be silly to waste human time on corrections that a better OCR would have made unnecessary.

How much better is the non-fast OCR? Should one hope for a non-fast OCR, e.g. by submitting only when the servers are not overloaded? (Is there even a way to do that, or to lower the priority of one's own jobs?) Or should one just ask IA staff to rerun the derive when a precise OCR is especially needed?
